Efficient Algorithms for Scheduling Moldable Tasks
We study the problem of scheduling independent moldable tasks on
processors that arises in large-scale parallel computations. When tasks are
monotonic, the best known result is a -approximation
algorithm for makespan minimization with a complexity linear in and
polynomial in and where is
arbitrarily small. We propose a new perspective of the existing speedup models:
the speedup of a task is linear when the number of assigned
processors is small (up to a threshold ) while it presents
monotonicity when ranges in ; the bound
indicates an unacceptable overhead when parallelizing on too many processors.
For a given integer , let . In this paper, we propose a -approximation algorithm for makespan minimization with a
complexity where
(). As
a by-product, we also propose a -approximation algorithm for
throughput maximization with a common deadline with a complexity
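The abstract above does not spell out what "monotonic" means. The sketch below assumes the definition commonly used in the moldable-task literature: as the number of processors grows, a task's execution time does not increase and its total work (processors times time) does not decrease. The function name, the input layout, and the tolerance are illustrative assumptions, not taken from the paper.

```python
def is_monotonic(times, tol=1e-9):
    """Check the assumed monotonicity conditions for one moldable task.

    times[p - 1] is the execution time of the task on p processors
    (hypothetical input layout chosen for this sketch).
    """
    for p in range(1, len(times)):
        t_p, t_next = times[p - 1], times[p]
        # Condition 1: adding a processor must not slow the task down.
        if t_next > t_p + tol:
            return False
        # Condition 2: total work p * t(p) must not shrink,
        # i.e. no super-linear speedup.
        if (p + 1) * t_next < p * t_p - tol:
            return False
    return True


# Linear speedup up to 4 processors, then no further gain:
# work stays at 12 and then rises to 15, so both conditions hold.
linear_then_flat = [12, 6, 4, 3, 3]

# Super-linear speedup from 1 to 2 processors: work drops from 12 to 10.
super_linear = [12, 5]

print(is_monotonic(linear_then_flat))
print(is_monotonic(super_linear))
```

Under this assumed definition, a task with linear speedup up to a threshold and no gain beyond it is monotonic, which matches the shape of the speedup model the abstract describes.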
A Game-Theoretic Study on Non-Monetary Incentives in Data Analytics Projects with Privacy Implications
The amount of personal information contributed by individuals to digital
repositories such as social network sites has grown substantially. The
existence of this data offers unprecedented opportunities for data analytics
research in various domains of societal importance including medicine and
public policy. The results of these analyses can be considered a public good
which benefits data contributors as well as individuals who are not making
their data available. At the same time, the release of personal information
carries perceived and actual privacy risks to the contributors. Our research
addresses this problem area. In our work, we study a game-theoretic model in
which individuals take control over participation in data analytics projects in
two ways: 1) individuals can contribute data at a self-chosen level of
precision, and 2) individuals can decide whether to contribute at all. From
the analyst's perspective, we investigate to what degree the
research analyst has flexibility to set requirements for data precision, so
that individuals are still willing to contribute to the project, and the
quality of the estimation improves. We study this tradeoff scenario for
populations of homogeneous and heterogeneous individuals, and determine Nash
equilibria that reflect the optimal level of participation and precision of
contributions. We further prove that the analyst can substantially increase the
accuracy of the analysis by imposing a lower bound on the precision of the data
that users can reveal.
Linear Regression from Strategic Data Sources
Linear regression is a fundamental building block of statistical data
analysis. It amounts to estimating the parameters of a linear model that maps
input features to corresponding outputs. In the classical setting where the
precision of each data point is fixed, the famous Aitken/Gauss-Markov theorem
in statistics states that generalized least squares (GLS) is a so-called "Best
Linear Unbiased Estimator" (BLUE). In modern data science, however, one often
faces strategic data sources, namely, individuals who incur a cost for
providing high-precision data.
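As a minimal sketch of the classical fixed-precision setting described above, consider a one-feature model without intercept, y_i = beta * x_i + noise_i, with independent noise of known variance per data point. Under these assumptions GLS reduces to weighted least squares, where each point is weighted by its precision (the inverse of its variance). The function names and the toy data are illustrative and do not come from the paper.

```python
def gls_slope(xs, ys, variances):
    """GLS estimate of beta for y_i = beta * x_i + noise_i.

    Assumes independent noise with known variance variances[i],
    so the noise covariance matrix is diagonal.
    """
    numerator = sum(x * y / v for x, y, v in zip(xs, ys, variances))
    denominator = sum(x * x / v for x, v in zip(xs, variances))
    if denominator == 0:
        raise ValueError("at least one data point needs a non-zero feature")
    return numerator / denominator


def gls_variance(xs, variances):
    """Variance of the GLS estimate under the same assumptions."""
    return 1.0 / sum(x * x / v for x, v in zip(xs, variances))


# Toy data: outputs lie exactly on the line y = 2x, with unequal precision.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
variances = [0.5, 1.0, 4.0]

print(gls_slope(xs, ys, variances))
print(gls_variance(xs, variances))
```

In this sketch the estimator variance shrinks whenever any data point becomes more precise, which is why the precision each individual chooses matters to everyone who benefits from the estimate.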
In this paper, we study a setting in which features are public but
individuals choose the precision of the outputs they reveal to an analyst. We
assume that the analyst performs linear regression on this dataset, and
individuals benefit from the outcome of this estimation. We model this scenario
as a game where individuals minimize a cost comprising two components: (a) an
(agent-specific) disclosure cost for providing high-precision data; and (b) a
(global) estimation cost representing the inaccuracy in the linear model
estimate. In this game, the linear model estimate is a public good that
benefits all individuals. We establish that this game has a unique non-trivial
Nash equilibrium. We study the efficiency of this equilibrium and we prove
tight bounds on the price of stability for a large class of disclosure and
estimation costs. Finally, we study the estimator accuracy achieved at
equilibrium. We show that, in general, Aitken's theorem does not hold under
strategic data sources, though it does hold if individuals have identical
disclosure costs (up to a multiplicative factor). When individuals have
non-identical costs, we derive a bound on the improvement of the equilibrium
estimation cost that can be achieved by deviating from GLS, under mild
assumptions on the disclosure cost functions.
Comment: This version (v3) extends the results on the sub-optimality of GLS
(Section 6) and improves writing in multiple places compared to v2. Compared
to the initial version v1, it also fixes an error in Theorem 6 (now Theorem
5), and extended many of the result
Dynamic Resource Management in Clouds: A Probabilistic Approach
Dynamic resource management has become an active area of research in the
Cloud Computing paradigm. The cost of resources varies significantly depending on
how they are configured. Hence, efficient management of resources is of
prime interest to both Cloud Providers and Cloud Users. In this work we suggest
a probabilistic resource provisioning approach that can be exploited as the
input of a dynamic resource management scheme. Using a Video on Demand use case
to justify our claims, we propose an analytical model, inspired by standard
models of epidemic spreading, to represent sudden and intense
workload variations. We show that the resulting model satisfies a Large
Deviation Principle that statistically characterizes extreme rare events, such
as the ones produced by "buzz/flash crowd effects" that may cause workload
overflow in the VoD context. This analysis provides valuable insight into
the abnormal behaviors that can be expected of such systems. We exploit the information obtained
using the Large Deviation Principle for the proposed Video on Demand use-case
for defining policies (Service Level Agreements). We believe these policies for
elastic resource provisioning and usage may be of some interest to all
stakeholders in the emerging context of cloud networking.
Comment: IEICE Transactions on Communications (2012). arXiv admin note:
substantial text overlap with arXiv:1209.515